Goto

Collaborating Authors

 serverless platform



A Mean-Field Game Approach to Cloud Resource Management with Function Approximation

Neural Information Processing Systems

Reinforcement learning (RL) has gained increasing popularity for resource management in cloud services such as serverless computing. As self-interested users compete for shared resources in a cluster, the multi-tenancy nature of serverless platforms necessitates multi-agent reinforcement learning (MARL) solutions, which often suffer from severe scalability issues. In this paper, we propose a mean-field game (MFG) approach to cloud resource management that is scalable to a large number of users and applications and incorporates function approximation to deal with the large state-action spaces in real-world serverless platforms. Specifically, we present an online natural actor-critic algorithm for learning in MFGs compatible with various forms of function approximation. We theoretically establish its finite-time convergence to the regularized Nash equilibrium under linear function approximation and softmax parameterization. We further implement our algorithm using both linear and neural-network function approximations, and evaluate our solution on an open-source serverless platform, OpenWhisk, with real-world workloads from production traces. Experimental results demonstrate that our approach is scalable to a large number of users and significantly outperforms various baselines in terms of function latency and resource utilization efficiency.



A Mean-Field Game Approach to Cloud Resource Management with Function Approximation

Neural Information Processing Systems

Reinforcement learning (RL) has gained increasing popularity for resource management in cloud services such as serverless computing. As self-interested users compete for shared resources in a cluster, the multi-tenancy nature of serverless platforms necessitates multi-agent reinforcement learning (MARL) solutions, which often suffer from severe scalability issues. In this paper, we propose a mean-field game (MFG) approach to cloud resource management that is scalable to a large number of users and applications and incorporates function approximation to deal with the large state-action spaces in real-world serverless platforms. Specifically, we present an online natural actor-critic algorithm for learning in MFGs compatible with various forms of function approximation. We theoretically establish its finite-time convergence to the regularized Nash equilibrium under linear function approximation and softmax parameterization.


Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

Liu, Mengfan, Wang, Wei, Wu, Chuan

arXiv.org Artificial Intelligence

With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models have been a dominant type of model architectures to enable large models nowadays, with parallel expert networks. Serving large MoE models on serverless computing is potentially beneficial, but has been underexplored due to substantial challenges in handling the skewed expert popularity and scatter-gather communication bottleneck in MoE model execution, for cost-efficient serverless MoE deployment and performance guarantee. We study optimized MoE model deployment and distributed inference serving on a serverless platform, that effectively predict expert selection, pipeline communication with model execution, and minimize the overall billed cost of serving MoE models. Especially, we propose a Bayesian optimization framework with multi-dimensional epsilon-greedy search to learn expert selections and optimal MoE deployment achieving optimal billed cost, including: 1) a Bayesian decision-making method for predicting expert popularity; 2) flexibly pipelined scatter-gather communication; and 3) an optimal model deployment algorithm for distributed MoE serving. Extensive experiments on AWS Lambda show that our designs reduce the billed cost of all MoE layers by at least 75.67% compared to CPU clusters while maintaining satisfactory inference throughput. As compared to LambdaML in serverless computing, our designs achieves 43.41% lower cost with a throughput decrease of at most 18.76%.


Transactions and Serverless are Made for Each Other

Communications of the ACM

Serverless cloud offerings are becoming increasingly popular for stateless applications because they simplify cloud deployment. This article argues that if serverless platforms could wrap functions in database transactions, they would also be a good fit for database-backed applications. There are two unique benefits of such a transactional serverless platform: time-travel debugging of past events and reliable program execution with "exactly-once" semantics. Serverless cloud platforms such as Amazon Web Services (AWS) Lambda and Azure Functions are increasingly popular for building production applications as varied as website front ends, machine-learning (ML) pipelines, and image-processing systems. These platforms radically simplify development by managing application deployment.


AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments

Schuler, Lucia, Jamil, Somaya, Kühl, Niklas

arXiv.org Artificial Intelligence

Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. It promises the user services at large scale and low cost while eliminating the need for infrastructure management. On cloud provider side, flexible resource management is required to meet fluctuating demand. It can be enabled through automated provisioning and deprovisioning of resources. A common approach among both commercial and open source serverless computing platforms is workload-based auto-scaling, where a designated algorithm scales instances according to the number of incoming requests. In the recently evolving serverless framework Knative a request-based policy is proposed, where the algorithm scales resources by a configured maximum number of requests that can be processed in parallel per instance, the so-called concurrency. As we show in a baseline experiment, this predefined concurrency level can strongly influence the performance of a serverless application. However, identifying the concurrency configuration that yields the highest possible quality of service is a challenging task due to various factors, e.g. varying workload and complex infrastructure characteristics, influencing throughput and latency. While there has been considerable research into intelligent techniques for optimizing auto-scaling for virtual machine provisioning, this topic has not yet been discussed in the area of serverless computing. For this reason, we investigate the applicability of a reinforcement learning approach, which has been proven on dynamic virtual machine provisioning, to request-based auto-scaling in a serverless framework. Our results show that within a limited number of iterations our proposed model learns an effective scaling policy per workload, improving the performance compared to the default auto-scaling configuration.


Finding photos on Twitter using face recognition with TensorFlow.js

#artificialintelligence

As a developer advocate, I spend a lot of time at developer conferences (talking about serverless). Upon returning from each trip, I need to compile a "trip report" on the event for my bosses. This helps demonstrate the value in attending events and that I'm not just accruing air miles and hotel points for fun… I always include any social media content people post about my talks in the trip report. This is usually tweets with photos of me on stage. If people are tweeting about your session, I assume they enjoyed it and wanted to share with their followers.


How to run Tensorflow.js on a serverless platform : deploying models

#artificialintelligence

This is the last part of a 3 articles serie. In the first part, we introduced neural networks and TensorFlow framework basics. In the second part, we explained how to convert existing models from Python to TensorFlow.js Finally, we present today, through an example, how to use an online TensorFlow.js model and deploy it rapidly using our WarpJS JavaScript Serverless Function-as-a-Service (FaaS). Many public models can be retrieved from web databases.


The Rise of Serverless Computing

Communications of the ACM

Cloud computing in general, and Infrastructure-as-a-Service (IaaS) in particular, have become widely accepted and adopted paradigms for computing with the offerings of virtual machines (VM) on demand. By 2020, 67% of enterprise IT infrastructure and software spending will be for cloud-based offerings.16 A major factor in the increased adoption of the cloud by enterprise IT was its pay-as-you-go model where a customer pays only for resources leased from the cloud provider and have the ability to get as many resources as needed with no up-front cost (elasticity).2 Unfortunately, the burden of scaling was left for developers and system designers that typically used overprovisioning techniques to handle sudden surges in service requests. Studies of reported usage of cloud resources in datacenters19 show a substantial gap between the resources that cloud customers allocate and pay for (leasing VMs), and actual resource utilization (CPU, memory, and so on). Serverless computing is emerging as a new and compelling paradigm for the deployment of cloud applications, largely due to the recent shift of enterprise application architectures to containers and microservices.23 Using serverless gives pay-as-you-go without additional work to start and stop server and is closer to original expectations for cloud computing to be treated like as a utility.2 Developers using serverless computing can get cost savings and scalability without needing to havea high level of cloud computing expertise that is time-consuming to acquire. Due to its simplicity and economical advantages, serverless computing is gaining popularity as reported by the increasing rate of the "serverless" search term by Google Trends. Its market size is estimated to grow to 7.72 billion by 2021.10 Most prominent cloud providers including Amazon, IBM, Microsoft, Google, and others have already released serverless computing capabilities with several additional open source efforts driven by both industry and academic institutions (for example, see CNCF Serverless Cloud Native Landscapea).